TIME SERIES, STATISTICS, AND INFORMATION

by 

Emanuel Parzen^1

Department  of  Statistics,  Texas  A&M  University 

Abstract 

This paper is a broad survey of ideas for the future development of statistical methods
of time series analysis based on investigating the many levels of relationships between
time series analysis, statistical methods unification, and inverse problems with positivity
constraints. It is hoped that developing these relations will: help integrate old and new
directions of research in time series analysis; provide research tools for applied and
theoretical statisticians in the 1990's and coming era of statistical information; make possible
unification of statistical methods and the development of Statistical Culture. New results
include a new information divergence between spectral density functions. Topics discussed
include:

1) Traditional entropy and cross-entropy

2) Renyi and Chi-square information divergence

3) Comparison density functions

4) Approximation of positive functions (density functions) by minimum information
divergence (maximum entropy)

5) Equivalence and Orthogonality of Normal Time Series

6) Asymptotic Information of Stationary Normal Time Series

7) Estimation of Finite Parameter Spectral Densities

8) Minimum information estimation of spectral densities and power index correlation

9) Tail classification of probability laws and spectral densities

10) Sample Brownian Bridge exploratory analysis of time series.


^1 Preliminary version of paper to be presented at Workshop in New Directions in Time
Series Analysis at Institute for Mathematics and Its Applications, University of Minnesota
on July 16, 1990, a day dedicated to honor John Tukey's contributions to Time Series
Analysis. Research supported by U. S. Army Research Office.
0. Introduction


The general level of the current relation (or non-relation) between statistics and time
series analysis is: (1) many applied statisticians are ignorant about the theory of time series
analysis, (2) many departments of statistics offer almost no courses in time series analysis
(often relying on courses taught in economics or engineering departments), (3) theoretical
statisticians traditionally have regarded time series analysis as safe to ignore because it is
a "technical" subject in which it is difficult to confront the basic issues of statistical inference
about which they want to do research.

Time  series  models  are  becoming  of  research  interest  to  some  theoretical  statisticians 
whose  primary  research  areas  involve  statistical  analysis  of  data  obeying  the  classical  model 
of  independent  observations.  They  would  like  to  investigate  the  extension  of  their  work  to 
data  obeying  probability  models  of  dependence.  This  relation  of  time  series  analysis  and 
statistics  has  a  possibility  of  being  superficial  because  it  is  completely  methods-driven, 
rather  than  problem-driven.  Consequently  it  may  not  handle  problems  that  are  of  real 
interest to applied users of time series analysis. Questions of asymptotic rate of convergence
of parameter estimators are technical problems which fail to treat basic problems (such as
model identification and/or non-regular estimation for long-memory or non-Gaussian time
series) that are usually the central problems in a time series analysis.

I propose that the narrow reasons why statisticians should learn about the methods
of time series analysis (they are important for applications and many potential clients
have time series problems) should be supplemented by broad information age reasons:
the development of Statistical Culture requires that statisticians should learn about the
theory of time series because it will help them improve their mastery of the basic methods
of statistical analysis for traditional data consisting of independent observations. The
theory of time series analysis needs to become exoteric (belonging to the outer or less
initiate circle) as well as esoteric.

My  concept  of  Statistical  Culture  (Parzen  (1990))  proposes  that  a  statistical  analysis 
should aim to provide not a single answer but a choice of answers (answers by several
methods for the same problem). Therefore a framework for comparing answers is required.
A framework should also provide ways of thinking for classical problems that extend to as
many modern problems as possible. This paper discusses: the basic ideas of a unification
of  statistical  methods  in  terms  of  information  concepts,  the  relation  between  time  series 
and  statistics  in  terms  of  their  relations  with  information  statistics,  a  framework  for  time 
series  methods  in  terms  of  information  concepts,  and  suggestions  for  research  problems  in 
time  series  analysis.  The  references  aim  to  include  many  influential  papers  on  information 
methods  in  statistics;  additional  references  are  warmly  solicited. 

An aim of this paper is to stimulate discussion of the mind-boggling discovery which
appears to be emerging in modern statistical research and which I call I. O. U. (Information,
Optimization, Unification). It appears that one can find a common type of optimization
and approximation problem that provides a link among almost all classical and modern
statistical analysis problems! An inner product $\langle f, g \rangle$ is the integral of the product
$fg$. Let the information about an unknown NON-NEGATIVE function $f$ be the values
of linear functionals which are inner products of $f$ with specified score functions $J_k$: for
$k = 1, \ldots, m$,

$$\langle f, J_k \rangle = \tau_k$$

for specified constants $\tau_k$ called "moment parameters". Find a NON-NEGATIVE function,
denoted

$$\hat{f} \ \text{or} \ \hat{f}(\tau_1, \ldots, \tau_m),$$

which among all functions satisfying the above constraints minimizes an information
divergence criterion (which includes as a special case maximum entropy). This problem is
called an inverse problem with positivity constraints.

We favor Renyi information criteria which imply that $\hat{f}$ has a representation $f_\theta$: for
suitable index $\lambda$ and parameters $\theta_1, \ldots, \theta_m$ (which we call inverse parameters)

$$\left( f_\theta^{\lambda} - 1 \right) / \lambda = \sum_{k=1}^{m} \theta_k J_k.$$

The inverse parameters $\theta$ are functions of the moment parameters and are obtained by solving

$$\langle f_\theta, J_k \rangle = \tau_k.$$

Uncertainty (probability and statistics) enters the picture because one observes a raw
random function (denoted $\tilde{f}$) or at least its inner products

$$\langle \tilde{f}, J_k \rangle = \tilde{\tau}_k, \quad k = 1, \ldots, m.$$

One then seeks $\hat{f}$ which is non-negative and minimizes the specified information criterion
among all functions satisfying $\langle f, J_k \rangle = \tilde{\tau}_k$, $k = 1, \ldots, m$. Among new data analytic
tools that are open problems for research are "profile functions", defined as the minimum
value of the criterion as a function of the moment parameters.

The  problems  that  need  to  be  solved  to  apply  the  foregoing  approach  to  unifying 
statistical  methods  include 

(1) introducing a suitable density function $d(u)$, $0 < u < 1$, whose estimation underlies
conventional problems,

(2) determining sufficient statistics $J_k(u)$ whose inner products with $d(u)$, or $d^{\lambda}(u)$ for
some power $\lambda$, are regarded as most significantly different from zero and therefore provide
the constraints on the unknown $d$,

(3) determining information measures whose index $\lambda$ provides a parameter formula
for $d$ of the form: $d^{\lambda}(u)$ is a linear combination of known functions $J_k(u)$ with coefficients
$\theta_k$ to be estimated,

(4) developing and implementing algorithms to compute the solutions of the optimization
problems.

Other  aspects  of  unification  of  time  series  analysis  methods  were  discussed  in  Parzen 
(1958),  Parzen  (1961),  Parzen  (1965),  Parzen  (1971),  and  Parzen  (1974). 
1. Traditional Entropy and Cross-Entropy

The (Kullback-Leibler) information divergence between two probability distributions
$F$ and $G$ is defined (Kullback (1959)) by (our definitions differ from usual definitions by a
factor of 2)

$$I(F;G) = (-2) \int_{-\infty}^{\infty} \log\{g(x)/f(x)\}\, f(x)\,dx$$

when $F$ and $G$ are continuous with probability density functions $f(x)$ and $g(x)$; when $F$ and
$G$ are discrete, with probability mass functions $p_F(x)$ and $p_G(x)$, information divergence
is defined by

$$I(F;G) = (-2) \sum_x \log\{p_G(x)/p_F(x)\}\, p_F(x).$$

An information decomposition of information divergence is

$$I(F;G) = H(F;G) - H(F),$$

in terms of entropy $H(F)$ and cross-entropy $H(F;G)$:

$$H(F) = (-2) \int_{-\infty}^{\infty} \{\log f(x)\}\, f(x)\,dx,$$

$$H(F;G) = (-2) \int_{-\infty}^{\infty} \{\log g(x)\}\, f(x)\,dx.$$
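As a numerical illustration of the decomposition $I(F;G) = H(F;G) - H(F)$, the following minimal sketch evaluates the three integrals by a midpoint rule. The choice of $F$ as Normal(0,1) and $G$ as Normal(1,4), the integration range, and all function names are assumptions made only for this example.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd^2) distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def integrate(func, lo=-12.0, hi=12.0, n=24000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

def f(x):
    return normal_pdf(x, 0.0, 1.0)  # density of F (illustrative choice)

def g(x):
    return normal_pdf(x, 1.0, 2.0)  # density of G (illustrative choice)

# Each quantity carries the factor of 2 used in the definitions of this section.
entropy = -2.0 * integrate(lambda x: math.log(f(x)) * f(x))
cross_entropy = -2.0 * integrate(lambda x: math.log(g(x)) * f(x))
divergence = -2.0 * integrate(lambda x: math.log(g(x) / f(x)) * f(x))

print(entropy, cross_entropy, divergence, cross_entropy - entropy)
```

The last two printed numbers should agree up to rounding, and both should match twice the usual closed-form Kullback-Leibler divergence between the two normal distributions.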


2.  Renyi  and  Chi  Square  Information 

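Adapting the fundamental work of Renyi (1961) this section offers a new definition of
Renyi information of index $\lambda$. For continuous $F$ and $G$: for $\lambda \ne 0, -1$

$$IR_\lambda(F;G) = \frac{2}{\lambda(1+\lambda)} \log \int \left\{\frac{g(y)}{f(y)}\right\}^{1+\lambda} f(y)\,dy$$

$$= \frac{2}{\lambda(1+\lambda)} \log \int \left[ \left\{\frac{g(y)}{f(y)}\right\}^{1+\lambda} - (1+\lambda)\left\{\frac{g(y)}{f(y)} - 1\right\} \right] f(y)\,dy,$$

$$IR_0(F;G) = 2 \int \left\{\frac{g(y)}{f(y)} \log \frac{g(y)}{f(y)}\right\} f(y)\,dy = 2 \int \left\{\frac{g(y)}{f(y)} \log \frac{g(y)}{f(y)} - \frac{g(y)}{f(y)} + 1\right\} f(y)\,dy,$$

$$IR_{-1}(F;G) = -2 \int \left\{\log \frac{g(y)}{f(y)}\right\} f(y)\,dy = -2 \int \left\{\log \frac{g(y)}{f(y)} - \frac{g(y)}{f(y)} + 1\right\} f(y)\,dy.$$

An analogous definition holds for discrete $F$ and $G$.

The second definition provides: (1) extensions to non-negative functions which are not
densities, and also (2) a non-negative integrand which can provide diagnostic measures at
each value of $y$.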

Renyi information, for $-1 < \lambda < 0$, is equivalent to Bhattacharyya distance
(Bhattacharyya (1943)).

In addition to Renyi information divergence (an extension of information statistics)
one uses as information divergence between two non-negative functions an extension of
chi-square statistics which has been developed by Read and Cressie (1988). For $\lambda \ne 0, -1$,
Chi-square divergence of index $\lambda$ is defined for continuous $F$ and $G$ by

$$C_\lambda(F;G) = \int B_\lambda\!\left(\frac{g(y)}{f(y)}\right) f(y)\,dy$$

where

$$B_\lambda(d) = \frac{2}{\lambda(1+\lambda)} \left\{ d^{1+\lambda} - (1+\lambda)(d-1) - 1 \right\},$$

$$B_0(d) = 2\{d \log d - d + 1\},$$

$$B_{-1}(d) = -2\{\log d - d + 1\}.$$
Important properties of $B_\lambda(d)$ are:

$$B_\lambda(d) \ge 0, \quad B_\lambda(1) = B_\lambda'(1) = 0,$$

$$B_\lambda'(d) = \frac{2}{\lambda}\left(d^{\lambda} - 1\right), \quad B_\lambda''(d) = 2 d^{\lambda - 1},$$

$$B_0(d) = 2(d \log d - d + 1)$$

$$B_{-.5}(d) = 4\left(d^{.5} - 1\right)^2$$

$$B_{-1}(d) = -2(\log d - d + 1)$$

$$B_{-2}(d) = d\left(d^{-1} - 1\right)^2$$

An analogous definition holds for discrete $F$ and $G$. Axiomatic derivations of information
measures similar to $C_\lambda$ are given by Jones and Byrne (1990).

The Renyi information and chi-square divergence measures are related:

$$IR_0(F;G) = C_0(F;G)$$

$$IR_{-1}(F;G) = C_{-1}(F;G)$$

For $\lambda \ne 0, -1$,

$$IR_\lambda(F;G) = \frac{2}{\lambda(1+\lambda)} \log\left\{ 1 + \frac{\lambda(1+\lambda)}{2}\, C_\lambda(F;G) \right\}$$

Interchange of $F$ and $G$ is provided by the Lemma:

$$C_\lambda(F;G) = C_{-(1+\lambda)}(G;F)$$

$$IR_\lambda(F;G) = IR_{-(1+\lambda)}(G;F)$$

Our survey in this paper suggests (in section 6) a new class of information measures:

$$A_\lambda(F;G) = \int A_\lambda\!\left(\frac{g(y)}{f(y)}\right) f(y)\,dy$$

For $\lambda \ne 0, -1$, perhaps most usefully $-1 < \lambda < 0$,

$$A_\lambda(d) = \frac{2}{\lambda(1+\lambda)} \left\{ (1+\lambda) \log d - \log\{1 + (1+\lambda)(d-1)\}_+ \right\}$$

Note $A_{-1}(F;G) = IR_{-1}(F;G)$,

$$A_0(d) = 2\{\log d - 1 + (1/d)\} = (1/d) B_0(d).$$
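The relations of this section between Renyi information and chi-square divergence can be checked numerically in the discrete case. The sketch below is illustrative only: the two probability mass functions are arbitrary, the function names are not from the paper, and the general-index formula for $B_\lambda$ is the one obtained by integrating $B_\lambda'(d) = (2/\lambda)(d^\lambda - 1)$ from $B_\lambda(1) = 0$.

```python
import math

def B(lam, d):
    """Integrand B_lambda(d) of the chi-square divergence of index lam."""
    if lam == 0:
        return 2.0 * (d * math.log(d) - d + 1.0)
    if lam == -1:
        return -2.0 * (math.log(d) - d + 1.0)
    return (2.0 / (lam * (1.0 + lam))) * (d ** (1.0 + lam) - (1.0 + lam) * (d - 1.0) - 1.0)

def chi_square_divergence(lam, p_f, p_g):
    """C_lambda(F;G) for discrete F, G with positive mass functions p_f, p_g."""
    return sum(B(lam, g / f) * f for f, g in zip(p_f, p_g))

def renyi_information(lam, p_f, p_g):
    """IR_lambda(F;G) for discrete F, G with positive mass functions p_f, p_g."""
    if lam in (0, -1):
        return chi_square_divergence(lam, p_f, p_g)
    total = sum((g / f) ** (1.0 + lam) * f for f, g in zip(p_f, p_g))
    return (2.0 / (lam * (1.0 + lam))) * math.log(total)

p_f = [0.2, 0.3, 0.5]  # illustrative mass function of F
p_g = [0.4, 0.4, 0.2]  # illustrative mass function of G

for lam in (-0.5, 1.0, 2.0):
    c = chi_square_divergence(lam, p_f, p_g)
    ir = renyi_information(lam, p_f, p_g)
    via_c = (2.0 / (lam * (1.0 + lam))) * math.log(1.0 + 0.5 * lam * (1.0 + lam) * c)
    swapped = chi_square_divergence(-(1.0 + lam), p_g, p_f)
    print(lam, ir, via_c, c, swapped)
```

For each index the second and third printed values should coincide (the relation between $IR_\lambda$ and $C_\lambda$), as should the fourth and fifth (the interchange Lemma).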


3.  Comparison  Density  Functions 

Information divergence $I(F;G)$ is a concept that works for both multivariate and
univariate distributions. This section shows that the univariate case is distinguished by
the fact that we are able to relate $I(F;G)$ to the concept of comparison density $d(u; F, G)$.
Quantile domain concepts introduced in Parzen (1979) play a central role; $Q(u) =
F^{-1}(u)$ is the quantile function. When $F$ is continuous, we define the density quantile
function $fQ(u) = f(Q(u))$, score function $J(u) = -(fQ(u))'$, and quantile density function

$$q(u) = 1/fQ(u) = Q'(u).$$

When $F$ is discrete, we define $fQ(u) = p_F(Q(u))$, $q(u) = 1/fQ(u)$.

The comparison density $d(u; F, G)$ is defined as follows: when $F$ and $G$ are both
continuous,

$$d(u; F, G) = g(F^{-1}(u)) / f(F^{-1}(u));$$

when $F$ and $G$ are both discrete

$$d(u; F, G) = p_G(F^{-1}(u)) / p_F(F^{-1}(u)).$$

In the continuous case $d(u; F, G)$ is the derivative of

$$D(u; F, G) = G(F^{-1}(u));$$

in the discrete case we define the comparison distribution function

$$D(u; F, G) = \int_0^u d(t; F, G)\,dt.$$

Let $F$ denote the true distribution function of a continuous random variable $Y$. To test
the goodness of fit hypothesis $H_0: F = G$, one transforms to $W = G(Y)$ whose distribution
function is $F(G^{-1}(u))$ and whose quantile function is $G(F^{-1}(u))$. The comparison densities
$d(u; F, G)$ and $d(u; G, F)$ are respectively the quantile density and the probability density
of $W$.

For a density $d(u)$, $0 < u < 1$, Renyi information (of index $\lambda$), denoted $IR_\lambda(d)$, is
non-negative and measures the divergence of $d(u)$ from uniform density $d_0(u) = 1$, $0 < u < 1$.
It is defined:

$$IR_0(d) = 2 \int_0^1 \{d(u) \log d(u)\}\,du = 2 \int_0^1 \{d(u) \log d(u) - d(u) + 1\}\,du$$

$$IR_{-1}(d) = -2 \int_0^1 \{\log d(u)\}\,du = -2 \int_0^1 \{\log d(u) - d(u) + 1\}\,du$$

for $\lambda \ne 0$ or $-1$

$$IR_\lambda(d) = \frac{2}{\lambda(1+\lambda)} \log \int_0^1 \{d(u)\}^{1+\lambda}\,du$$

$$= \frac{2}{\lambda(1+\lambda)} \log \int_0^1 \left( \{d(u)\}^{1+\lambda} - (1+\lambda)\{d(u) - 1\} \right) du.$$

To relate comparison density to information divergence we use the concept of Renyi
information $IR_\lambda$ which yields the important identity (and interpretation of $I(F;G)$!)

$$I(F;G) = (-2) \int_0^1 \log d(u; F, G)\,du$$

$$= IR_{-1}(d(u; F, G)) = IR_0(d(u; G, F)).$$

For a density $d(u)$, $0 < u < 1$, define

$$C_\lambda(d) = \int_0^1 B_\lambda(d(u))\,du.$$

The comparison density again unifies the continuous and discrete cases. One can show
that for univariate $F$ and $G$

$$C_\lambda(F;G) = C_\lambda(d(u; F, G)).$$
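A small worked example may make the comparison density concrete. Assume, purely for illustration, that $F$ is the exponential distribution with rate 1 and $G$ the exponential distribution with rate 2; then $F^{-1}(u) = -\log(1-u)$ and $d(u; F, G) = 2(1-u)$. The sketch below (function names are hypothetical) checks numerically that $d$ integrates to 1 and that $-2\int_0^1 \log d(u;F,G)\,du$ equals twice the usual Kullback-Leibler divergence of $G$ from $F$, which is $2 - 2\log 2$ for this pair.

```python
import math

def comparison_density(u):
    """d(u; F, G) = g(F^{-1}(u)) / f(F^{-1}(u)) for F = Exp(rate 1), G = Exp(rate 2)."""
    x = -math.log(1.0 - u)            # quantile function F^{-1}(u)
    f_x = math.exp(-x)                # density of F at x
    g_x = 2.0 * math.exp(-2.0 * x)    # density of G at x
    return g_x / f_x

def midpoint(func, n=20000):
    """Midpoint-rule approximation of the integral of func over (0, 1)."""
    h = 1.0 / n
    return h * sum(func((i + 0.5) * h) for i in range(n))

total_mass = midpoint(comparison_density)
ir_minus1 = -2.0 * midpoint(lambda u: math.log(comparison_density(u)))
closed_form = 2.0 - 2.0 * math.log(2.0)

print(total_mass, ir_minus1, closed_form)
```

The logarithmic singularity of the integrand at $u = 1$ is integrable, so the midpoint rule converges, although only at first order in the step size.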

4. Approximation of positive functions (density functions) by minimum information
divergence (maximum entropy)

This section discusses how approximation theory provides models for comparison density
functions. To a density $d(u)$, $0 < u < 1$, approximating functions are defined by
constraining (specifying) the inner product between $d(u)$ and a specified function $J(u)$,
called a score function. We often assume that the integral over (0,1) of $J(u)$ is zero, and
the integral of $J^2(u)$ is finite. A score function $J(u)$, $0 < u < 1$, is always defined to have
the property that its inner product with $d(u)$, denoted

$$[J, d] = [J(u), d(u)] = \int_0^1 J(u)\, d(u)\,du,$$

is finite. The inner product is called a component or linear detector; its value is a measure
of the difference between $d(u)$ and 1.

The question of which distributions to choose as $F$ and $G$ is often resolved by the
following formula which evaluates the inner product between $J(u)$ and $d(u; F, G)$ as a
moment with respect to $G$ if $J(u) = \varphi(F^{-1}(u))$:

$$[J(u), d(u; F, G)] = \int_{-\infty}^{\infty} \varphi(y)\,dG(y) = E_G[\varphi(Y)]$$

Often  G  is  a  raw  sample  distribution  and  F  is  a  smooth  distribution  which  is  a  model 
for  G  according  to  the  hypothesis  being  tested. 

We propose that non-parametric statistical inference and density estimation can be
based on the same criterion functions used for parametric inference if one uses the minimum
Renyi information approach to density estimation (which extends the maximum entropy
approach): form functions $\hat{d}_{\lambda,m}(u)$ which minimize $IR_\lambda(\hat{d}(u))$ among all functions $\hat{d}(u)$
satisfying the constraints

$$[J_k, \hat{d}] = [J_k, d] \quad \text{for } k = 1, \ldots, m$$

where $J_k(u)$ are specified score functions. One expects $\hat{d}_{\lambda,m}(u)$ to converge to $d(u)$ as $m$
tends to $\infty$, and $IR_\lambda(\hat{d}_{\lambda,m})$ to converge non-decreasingly to $IR_\lambda(d)$.

The case $\lambda = 1$ provides approximations in $L_2$ norm which are based on a sequence
$J_k(u)$, $k = 1, 2, \ldots$, which is a complete orthonormal set of functions. If $d(u)$, $0 < u < 1$,
is square integrable (equivalently, $IR_1(d)$ is finite) one can represent $d(u)$ as the limit of

$$d_m(u) = 1 + \sum_{k=1}^{m} [J_k, d]\, J_k(u), \quad m = 1, 2, \ldots.$$

When $\varphi_k(y)$, $k = 1, 2, \ldots$, is a complete orthonormal set for $L_2(F)$, a density $g(y)$ can
be approximated by

$$g_m(y) = f(y) \left\{ 1 + \sum_{k=1}^{m} E_G[\varphi_k(Y)]\, \varphi_k(y) \right\}$$

We call $d_m(u)$ a truncated orthogonal function (generalized Fourier) series.
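The truncated orthogonal series can be illustrated with a short sketch. The score functions used here, the first two orthonormal shifted Legendre polynomials on (0,1), and the test density $d(u) = 2(1-u)$ are assumptions chosen for the example; the function names are likewise hypothetical.

```python
import math

def J1(u):
    """First orthonormal shifted Legendre polynomial on (0, 1)."""
    return math.sqrt(12.0) * (u - 0.5)

def J2(u):
    """Second orthonormal shifted Legendre polynomial on (0, 1)."""
    return math.sqrt(5.0) * (6.0 * u * u - 6.0 * u + 1.0)

SCORES = (J1, J2)

def density(u):
    """Illustrative density on (0, 1); it is linear, so one term represents it exactly."""
    return 2.0 * (1.0 - u)

def inner(func1, func2, n=20000):
    """Midpoint-rule approximation of the inner product [func1, func2] on (0, 1)."""
    h = 1.0 / n
    return h * sum(func1((i + 0.5) * h) * func2((i + 0.5) * h) for i in range(n))

# Components [J_k, d], k = 1, 2.
components = [inner(J, density) for J in SCORES]

def truncated_series(u, m):
    """d_m(u) = 1 + sum_{k <= m} [J_k, d] J_k(u)."""
    return 1.0 + sum(components[k] * SCORES[k](u) for k in range(m))

print(components)
print(truncated_series(0.3, 1), density(0.3))
```

Because the illustrative density is linear in $u$, the second component vanishes and $d_1(u)$ already reproduces $d(u)$; for a general density the series would need more terms.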

An important general method of density approximation, called a weighted orthogonal
function approximation, is to use suitable weights $w_k$ to form approximations

$$d^*(u) = 1 + \sum_{k=1}^{\infty} w_k\, [J_k, d]\, J_k(u)$$

to $d(u)$. Often $w_k$ depends on a "truncation point" $m$, and $w_k \to 1$ as $m \to \infty$.

Quadratic Detectors. To test $H_0: d(u) = 1$, $0 < u < 1$, many traditional goodness of
fit test statistics (such as Cramer-von Mises and Anderson-Darling) can be expressed as
quadratic detectors

$$\sum_{k=1}^{\infty} \{w_k\, [J_k, d]\}^2 = \int_0^1 \{d^*(u) - 1\}^2\,du = C_1(d^*) = -1 + \exp IR_1(d^*).$$

We propose that these nonadaptive test statistics should be expressed as information
measures and compared with minimum Renyi information detectors $IR_\lambda(\hat{d}_{\lambda,m})$; in this way
information can provide unification of statistical methods.

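Maximum entropy approximators correspond to $\lambda = 0$; $\hat{d}_{0,m}(u)$ satisfies an exponential
model (whose parameters are denoted $\theta_1, \ldots, \theta_m$)

$$\log \hat{d}_{0,m}(u) = \sum_{k=1}^{m} \theta_k J_k(u) - \Psi(\theta_1, \ldots, \theta_m)$$

where $\Psi$ is the integrating factor that guarantees that $\hat{d}_{0,m}(u)$, $0 < u < 1$, integrates to 1:

$$\Psi(\theta_1, \ldots, \theta_m) = \log \int_0^1 \exp\left\{ \sum_{k=1}^{m} \theta_k J_k(u) \right\} du$$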
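To show how the parameters of the exponential model might be computed from a specified component, here is a minimal sketch for the case $m = 1$. It assumes a single score function $J_1(u) = \sqrt{12}(u - .5)$, a target component $\tau = -1/\sqrt{3}$, a midpoint grid for the integrals, and Newton's method on the moment equation; none of these choices comes from the paper, which does not prescribe an algorithm here.

```python
import math

N = 2000
GRID = [(i + 0.5) / N for i in range(N)]  # midpoint grid on (0, 1)

def score(u):
    """Illustrative score function J_1(u); integrates to 0 with unit square integral."""
    return math.sqrt(12.0) * (u - 0.5)

def fit_exponential_model(tau, steps=50):
    """Solve [J_1, d_theta] = tau for theta by Newton's method.

    Uses the exponential-family facts that the derivative of Psi is the mean of
    J_1 under d_theta and the second derivative is its variance.
    """
    theta = 0.0
    for _ in range(steps):
        weights = [math.exp(theta * score(u)) for u in GRID]
        norm = sum(weights) / N
        mean_j = sum(w * score(u) for w, u in zip(weights, GRID)) / (N * norm)
        second = sum(w * score(u) ** 2 for w, u in zip(weights, GRID)) / (N * norm)
        theta -= (mean_j - tau) / (second - mean_j ** 2)
    psi = math.log(sum(math.exp(theta * score(u)) for u in GRID) / N)
    return theta, psi

def fitted_density(u, theta, psi):
    """d_theta(u) = exp{theta J_1(u) - Psi(theta)}."""
    return math.exp(theta * score(u) - psi)

tau = -1.0 / math.sqrt(3.0)
theta_hat, psi_hat = fit_exponential_model(tau)
mass = sum(fitted_density(u, theta_hat, psi_hat) for u in GRID) / N
component = sum(score(u) * fitted_density(u, theta_hat, psi_hat) for u in GRID) / N

print(theta_hat, psi_hat, mass, component)
```

The fitted density integrates to 1 on the grid by construction of $\Psi$, and its component reproduces the target $\tau$ once Newton's iteration has converged.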
The approximating functions formed in practice are not computed from the true
components $[J_k, d]$ but from raw estimators $[J_k, \tilde{d}]$ for suitable raw estimators $\tilde{d}(u)$. The
approximating functions are interpreted as estimators of a true density. Methods proposed
for unification and generalization of statistical methods use minimum Renyi information
estimation techniques. Different applications of these methods differ mainly in how they
define the raw density $\tilde{d}(u)$ which is the starting point of the data analysis.

5.  Equivalence  and  Orthogonality  of  Normal  Time  Series 

This  section  formulates  in  terms  of  Renyi  information  some  classic  results  of  the  theory 
of  time  series  that  should  be  part  of  the  education  of  Ph.D.’s  in  statistics. 

To apply Neyman Pearson statistical inference to a time series $\{Y(t), t \in T\}$ with
abstract index set $T$, we must first define the probability density functional, denoted
$p(Y(\cdot); \theta)$. We assume a family of probability models (for the time series) parametrized by
a possibly infinite dimensional parameter $\theta$.

The model "$Y(\cdot)$ is normal with known covariance kernel $K(s,t) = \mathrm{cov}\{Y(s), Y(t)\}$
and unknown mean value function $m(t) = E[Y(t)]$" is equivalent to a probability
measure $P_m$ on the function space $R^T$ of all functions on $T$. We define $p(Y(\cdot); m)$ to be the
Radon-Nikodym derivative of $P_m$ with respect to $P_0$, the probability measure
corresponding to $m(t) = 0$ for all $t$.

Theorem (Parzen (1958)): In order that $P_m$ be absolutely continuous with respect to
$P_0$ it is necessary and sufficient that $m$ is in $H(K)$, the reproducing kernel Hilbert space
of functions on $T$ with reproducing kernel $K$ and inner product between functions $f$ and
$g$ in $H(K)$ denoted $\langle f, g \rangle_{H(K)}$ and satisfying

$$\langle f, K(\cdot, t) \rangle_{H(K)} = f(t)$$

for every $f$ in $H(K)$ and $t$ in $T$.

The probability measures $P_m$ and $P_0$ of normal time series are either equivalent or
orthogonal; when they have the same covariance kernel $K$ and mean value functions
differing by $m$ they are orthogonal if and only if $\|m\|_{H(K)} = \infty$ and they are equivalent if
and only if $\|m\|_{H(K)}$ is finite [which is equivalent to $m$ being a member of $H(K)$] and

$$\log p(Y(\cdot); m) = \langle Y, m \rangle^{\sim}_{H(K)} - .5\, \|m\|^2_{H(K)}$$

The random variable $\langle Y, m \rangle^{\sim}_{H(K)}$ is a "congruence inner product" (Parzen (1970)); it is
the linear combination of $\{Y(t), t \in T\}$ corresponding to $m$ under the congruence which
maps $K(\cdot, t)$ into $Y(t)$.

To compute the Renyi information of index $\lambda$ let $a = \|m\|_{H(K)}$, $\langle Y, m \rangle^{\sim}_{H(K)} = aZ$,
where $Z$ denotes a Normal (0,1) random variable. Then

$$IR_\lambda(P_0; P_m) = \frac{2}{\lambda(1+\lambda)} \log \mathrm{Int},$$

where

$$\mathrm{Int} = \int_{R^T} \{p(Y(\cdot); m)\}^{1+\lambda}\,dP_0$$

$$= E\left[\exp\{(1+\lambda)(aZ - .5a^2)\}\right]$$

$$= \exp\{.5(1+\lambda)^2 a^2 - .5(1+\lambda) a^2\}$$

$$= \exp\{.5\lambda(1+\lambda) a^2\}$$

Theorem: Renyi information of two common covariance normal time series:

$$IR_\lambda(P_0; P_m) = a^2 = \|m\|^2_{H(K)}$$

This beautiful formula illustrates that our definition of Renyi information has been adjusted
to be equivalent to a chi-squared statistic.
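The theorem can be checked numerically in the simplest special case, assumed here only for illustration: an index set $T$ with a single point and $K = 1$, so that $P_0$ is Normal(0,1), $P_m$ is Normal($a$,1), $\|m\|_{H(K)} = |a|$, and $p(y) = \exp(ay - .5a^2)$. The integration range and function names in this sketch are not from the paper.

```python
import math

def standard_normal_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def integrate(func, lo=-15.0, hi=15.0, n=30000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

def renyi_information_mean_shift(lam, a):
    """IR_lambda(P_0; P_m) for P_0 = Normal(0,1), P_m = Normal(a,1), lam not 0 or -1."""
    def integrand(y):
        log_p = a * y - 0.5 * a * a  # log of the density of P_m with respect to P_0
        return math.exp((1.0 + lam) * log_p) * standard_normal_pdf(y)
    return (2.0 / (lam * (1.0 + lam))) * math.log(integrate(integrand))

a = 1.5
results = {lam: renyi_information_mean_shift(lam, a) for lam in (-0.5, 1.0, 2.0)}
print(results, a * a)
```

Every index should return the same value $a^2$, which is the sense in which the definition behaves like a chi-squared statistic.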

Method of proof uses limits of information numbers: To prove results about equivalence
and orthogonality one studies the limit of information measures of finite dimensional
restrictions $P_m^{(n)}$ representing probability measures of $\{Y(t), t \in T^{(n)}\}$, where $T^{(n)}$ is a
finite set of points in $T$ converging monotonely to a set $T^{(\infty)}$ dense in $T$. The norm of the
restriction of $m$ to $T^{(n)}$, denoted $\|m\|_{(n)}$, is shown to converge to $\|m\|_{H(K)}$. For a time
series $\{Y(t), t \in T\}$ with abstract index set $T$, the finite dimensional Renyi information is
denoted

$$IR_\lambda^{(n)} = IR_\lambda\left(P_0^{(n)}; P_m^{(n)}\right).$$

Theorem. General martingale theory can be used to show that

(1) $P_0$ and $P_m$ are orthogonal if and only if $IR_{-.5}^{(n)}$ converges to $\infty$ as $n$ increases,

(2) $P_m$ is absolutely continuous with respect to $P_0$ if $IR_1^{(n)}$ has a finite limit as $n$ increases.
Proposition: Renyi information divergence of two zero mean univariate normal
distributions. Let $P_j$ be the distribution on the real line corresponding to Normal $(0, K_j)$
with variance $K_j$. Let $p(y)$ denote the probability density of $P_1$ with respect to $P_2$. Let
$\kappa = K_2 / K_1$. Then

$$p(y) = \kappa^{.5} \exp\left\{ -.5(\kappa - 1)\, y^2 / K_2 \right\}$$

$$IR_{-1}(P_2; P_1) = \kappa - 1 - \log \kappa,$$

$$IR_\lambda(P_2; P_1) = (1/\lambda) \left\{ \log \kappa - (1+\lambda)^{-1} \log\{1 + (1+\lambda)(\kappa - 1)\}_+ \right\}$$

$$C_\lambda(P_2; P_1) = \frac{2}{\lambda(1+\lambda)} \left\{ \kappa^{.5(1+\lambda)} \{1 + (1+\lambda)(\kappa - 1)\}_+^{-.5} - 1 \right\}$$
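The closed form for $IR_\lambda(P_2; P_1)$ can be compared with direct numerical integration of $\{2/\lambda(1+\lambda)\} \log \int p^{1+\lambda}\,dP_2$. The sketch below is illustrative: the variances, indices, integration range and function names are assumptions, and the indices are chosen so that $1 + (1+\lambda)(\kappa - 1)$ is positive.

```python
import math

def integrate(func, lo=-20.0, hi=20.0, n=40000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

def renyi_numeric(lam, k1, k2):
    """IR_lambda(P_2; P_1) by numerical integration, P_j = Normal(0, k_j)."""
    kappa = k2 / k1
    def integrand(y):
        p = math.sqrt(kappa) * math.exp(-0.5 * (kappa - 1.0) * y * y / k2)  # dP_1/dP_2
        dens2 = math.exp(-0.5 * y * y / k2) / math.sqrt(2.0 * math.pi * k2)  # density of P_2
        return p ** (1.0 + lam) * dens2
    return (2.0 / (lam * (1.0 + lam))) * math.log(integrate(integrand))

def renyi_closed_form(lam, k1, k2):
    """Closed form of the Proposition; requires 1 + (1 + lam)(kappa - 1) > 0."""
    kappa = k2 / k1
    return (1.0 / lam) * (math.log(kappa) - math.log(1.0 + (1.0 + lam) * (kappa - 1.0)) / (1.0 + lam))

cases = [(-0.5, 1.0, 2.0), (1.0, 1.0, 2.0), (-0.5, 2.0, 1.0)]
for lam, k1, k2 in cases:
    print(lam, k1, k2, renyi_numeric(lam, k1, k2), renyi_closed_form(lam, k1, k2))
```

As $\lambda$ tends to $-1$ the closed form tends to $\kappa - 1 - \log \kappa$, in agreement with the expression for $IR_{-1}(P_2; P_1)$.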


Asymptotic information can be computed from the fact (compare Hannan (1970), p. 429)

$$\lim_{n \to \infty} \|m\| \ldots \quad \text{(the remainder of this equation is illegible in the source)}$$

where $M(\omega)$ is the asymptotic spectral distribution function of $m(\cdot)$, assuming that $m(\cdot)$
obeys Grenander's conditions, or is "persistently exciting" in the language of control
engineers (Bohlin (1971)):

$$\rho_m(v) = \int_0^1 \exp(2\pi i v \omega)\,dM(\omega)$$

where $\rho_m(v)$ is the limit of sample autocorrelations of $m(\cdot)$.



6.  Asymptotic  Information  of  Stationary  Normal  Time  Series 

This section discusses unification of information measures of stationary normal time
series and information measures of non-negative functions which are spectral density
functions.

When a time series $\{Y(t), t = 1, 2, \ldots\}$ is modeled by probability measures $P_1$ and $P_2$ which
are orthogonal over the infinite sequence but equivalent for any finite sample, we define
asymptotic information divergence (or rate of information divergence)

$$\mathrm{AsymIR}_\lambda(P_2; P_1) = \lim_{n \to \infty} (1/n)\, IR_\lambda\left(P_2^{(n)}; P_1^{(n)}\right)$$

n — >00 

Let $Y(\cdot)$ be zero mean normal stationary with covariance function

$$R(v) = E[Y(t) Y(t - v)].$$

The correlation function is defined

$$\rho(v) = R(v) / R(0).$$

We prefer to analyze the time series after first subtracting its sample mean and dividing by
its sample standard deviation; its covariance function asymptotically equals $\rho(v)$.

An important classification of time series is by memory type: no memory, short memory,
long memory according as $I_\infty = 0$, $0 < I_\infty < \infty$, $I_\infty = \infty$ where

$$I_\infty = I(Y \mid Y_{-1}, Y_{-2}, \ldots) = E\, I\left(f_{Y \mid Y_{-1}, Y_{-2}, \ldots}; f_Y\right)$$

is the information about $Y(t)$ in $Y(t-1), Y(t-2), \ldots$, its infinite past (see Parzen (1981),
(1983)).

Assume that $Y(\cdot)$ is short memory and satisfies

$$\sum_{v=-\infty}^{\infty} |\rho(v)| \ \text{finite}.$$

The spectral density function $f(\omega)$, $0 < \omega < 1$, is defined as the Fourier transform of
the correlation function:

$$f(\omega) = \sum_{v=-\infty}^{\infty} \exp(-2\pi i v \omega)\, \rho(v)$$
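As an illustration of this definition, the sketch below sums the Fourier series for the correlation function $\rho(v) = \phi^{|v|}$ of a first-order autoregressive series and compares it with the standard closed form $(1 - \phi^2)/(1 + \phi^2 - 2\phi\cos 2\pi\omega)$. The value $\phi = 0.6$, the truncation lag, and the function names are assumptions made for the example.

```python
import cmath
import math

PHI = 0.6  # illustrative autoregressive coefficient

def rho(v):
    """Correlation function of a stationary AR(1) series with coefficient PHI."""
    return PHI ** abs(v)

def spectral_density_from_correlations(omega, max_lag=200):
    """Truncated Fourier sum of exp(-2 pi i v omega) rho(v) over |v| <= max_lag."""
    total = sum(cmath.exp(-2j * math.pi * v * omega) * rho(v)
                for v in range(-max_lag, max_lag + 1))
    return total.real  # imaginary parts cancel because rho is even

def ar1_closed_form(omega):
    return (1.0 - PHI ** 2) / (1.0 + PHI ** 2 - 2.0 * PHI * math.cos(2.0 * math.pi * omega))

for omega in (0.0, 0.1, 0.25, 0.5):
    print(omega, spectral_density_from_correlations(omega), ar1_closed_form(omega))
```

The correlations are absolutely summable, so the series is short memory in the sense above, and the resulting spectral density stays between $(1-\phi)/(1+\phi)$ and $(1+\phi)/(1-\phi)$.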
We call a time series bounded memory if the spectral density is bounded above and below:

$$0 < c_1 \le f(\omega) \le c_2 < \infty.$$

Let $P_f$ denote the probability measure on the space of infinite sequences $R_\infty$
corresponding to a normal zero mean stationary time series with spectral density function
$f(\omega)$.

A result of Pinsker [(1964), p. 196] can be interpreted as providing a formula for
asymptotic information divergence between two zero mean stationary time series with
respective rational spectral density functions $f(\omega)$ and $g(\omega)$. Write $\mathrm{AsymIR}_\lambda(f, g)$ for
$\mathrm{AsymIR}_\lambda(P_f; P_g)$. Adapting Pinsker (1964) one can prove that

$$\mathrm{AsymIR}_{-1}(f, g) = \int_0^1 \left\{ (f(\omega)/g(\omega)) - 1 - \log(f(\omega)/g(\omega)) \right\} d\omega$$

The definition of Renyi information can be extended to non-negative functions $d(u)$
which do not necessarily integrate to 1. Because spectral densities are even functions we
take the integral to be over $0 < \omega < .5$. One obtains the following important theorem.

Theorem: Unification of information measures of Pinsker (1964) and Itakura-Saito
(1970).

$$\mathrm{AsymIR}_{-1}(f, g) = IR_{-1}(f(\omega)/g(\omega))_{0,.5}$$
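The theorem rests on the evenness of spectral densities: the integral of $(f/g) - 1 - \log(f/g)$ over $(0,1)$ equals twice the integral over $(0,.5)$, which is $-2\int_0^{.5}\{\log d - d + 1\}\,d\omega$ with $d = f/g$. A minimal numerical sketch follows; the two first-order autoregressive spectral densities, their coefficients, and the function names are assumptions for illustration.

```python
import math

def ar1_spectral_density(phi, omega):
    """Spectral density of an AR(1) correlation function, normalized to integrate to 1."""
    return (1.0 - phi * phi) / (1.0 + phi * phi - 2.0 * phi * math.cos(2.0 * math.pi * omega))

def itakura_saito_integrand(f_val, g_val):
    ratio = f_val / g_val
    return ratio - 1.0 - math.log(ratio)

def integral(phi_f, phi_g, lo, hi, n=4000):
    """Midpoint-rule integral of the integrand over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(
        itakura_saito_integrand(ar1_spectral_density(phi_f, lo + (i + 0.5) * h),
                                ar1_spectral_density(phi_g, lo + (i + 0.5) * h))
        for i in range(n))

phi_f, phi_g = 0.6, 0.2  # illustrative coefficients
full_range = integral(phi_f, phi_g, 0.0, 1.0)           # integral over (0, 1)
half_range = 2.0 * integral(phi_f, phi_g, 0.0, 0.5)     # twice the integral over (0, .5)
same_model = integral(phi_f, phi_f, 0.0, 1.0)           # divergence of a model from itself

print(full_range, half_range, same_model)
```

The first two values should agree, the divergence is positive when the spectral densities differ, and it vanishes when they coincide.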

The validity of this information measure can be extended to non-normal asymptotically
stationary time series (Ephraim, Lev-Ari, Gray (1988)).

One can heuristically motivate Pinsker's information theoretic justification of the
Itakura-Saito distortion measure by the formula (at the end of section 5) for the
information divergence between two univariate normal distributions with zero means and
different variances. Motivated by this formula we propose a formula for bounded memory
time series (whose proof is given by Kazakos and Kazakos (1980)) which motivates a new
distortion measure:

$$\mathrm{AsymIR}_\lambda(f, g) = (1/\lambda) \int_0^1 \left\{ \log(f(\omega)/g(\omega)) - \frac{1}{1+\lambda} \log\{1 + (1+\lambda)((f(\omega)/g(\omega)) - 1)\}_+ \right\} d\omega$$
The properties of the integrand are the same as those of $B_\lambda(d)$.

Kazakos  and  Kazakos  (1980)  also  give  formulas  for  asymptotic  information  of  multiple 
stationary  time  series. 

7.  Estimation  of  Finite  Parameter  Spectral  Densities 

This  section  formulates  in  terms  of  Renyi  information  the  classic  asymptotic  maximum 
likelihood  Whittle  theory  of  spectral  estimation. 

For a random sample of a random variable with unknown probability density $f$, maximum
likelihood estimators $\hat{\theta}$ of the parameters of a finite parameter model $f_\theta$ of the
probability density $f$ can be shown to be equivalent to minimizing

$$IR_{-1}(\tilde{f}; f_\theta)$$

where $\tilde{f}$ is a raw estimator of $f$ (initially, a symbolic sample probability density formed
from the sample distribution function $\tilde{F}$). A similar result, called Whittle's estimator
(Whittle (1953)), holds for estimation of spectral densities of a bounded memory zero
mean stationary time series for which one assumes a finite parametric model $f_\theta(\omega)$ for the
true unknown spectral density $f(\omega)$.

A raw fully nonparametric estimator of $f(\omega)$ from a time series sample $Y(t)$, $t =
1, \ldots, n$, is the sample spectral density (or periodogram)

$$\tilde{f}(\omega) = \left| \sum_{t=1}^{n} Y(t) \exp(-2\pi i \omega t) \right|^2 \div \sum_{t=1}^{n} |Y(t)|^2$$

Note that $\tilde{f}(\omega)$ is not a consistent estimator of $f(\omega)$; nevertheless,

$$E[\tilde{f}(\omega)] \ \text{converges to} \ f(\omega),$$

a fact which can be taken as the definition of $f(\omega)$.
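The sample spectral density is straightforward to compute directly from its definition. The following sketch uses only the standard library; the short data series is invented for illustration and the function name is hypothetical. It also checks the normalization: by Parseval's identity, the average of $\tilde{f}$ over the Fourier frequencies $k/n$, $k = 0, \ldots, n-1$, equals 1.

```python
import cmath
import math

def periodogram(y, omega):
    """Sample spectral density: |sum_t Y(t) exp(-2 pi i omega t)|^2 / sum_t |Y(t)|^2."""
    n = len(y)
    transform = sum(y[t - 1] * cmath.exp(-2j * math.pi * omega * t) for t in range(1, n + 1))
    return abs(transform) ** 2 / sum(v * v for v in y)

series = [1.0, -0.5, 0.25, 2.0, -1.5, 0.75, -0.25, 0.5]  # illustrative data
n = len(series)
values = [periodogram(series, k / n) for k in range(n)]
average = sum(values) / n

print(values)
print(average)
```

A fast Fourier transform would be the practical choice for long series; the direct sum is used here only to keep the correspondence with the formula visible.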

An estimator $\hat{\theta}$ which is asymptotically equivalent to the maximum likelihood estimator
is obtained by minimizing

$$\mathrm{AsymIR}_{-1}(\tilde{f}; f_\theta) = IR_{-1}(\tilde{f}/f_\theta)_{0,.5} = \int_0^1 \left\{ (\tilde{f}(\omega)/f_\theta(\omega)) - 1 - \log(\tilde{f}(\omega)/f_\theta(\omega)) \right\} d\omega$$

which can be interpreted as choosing $\theta$ to make $\tilde{f}(\omega)/f_\theta(\omega)$ as flat or constant as possible.

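We usually use the representation

$$f_\theta(\omega) = \sigma^2 / \gamma_\theta(\omega)$$

where $\gamma_\theta(\omega)$ is the square modulus of the transfer function of the whitening filter
represented by the spectral density model $f_\theta$, constructed so that

$$\int_0^1 \log f_\theta(\omega)\,d\omega = \log \sigma^2.$$

Minimizing $\mathrm{AsymIR}_{-1}(\tilde{f}, f_\theta)$ is equivalent to minimizing

$$(1/\sigma^2) \int_0^1 \{\tilde{f}(\omega) \gamma_\theta(\omega)\}\,d\omega + \log(\sigma^2)$$

which is equivalent to minimizing over $\theta$

$$\sigma_\theta^2 = \int_0^1 \gamma_\theta(\omega) \tilde{f}(\omega)\,d\omega$$

and setting

$$\hat{\sigma}^2 = \int_0^1 \gamma_{\hat{\theta}}(\omega) \tilde{f}(\omega)\,d\omega = \sigma^2_{\hat{\theta}}.$$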
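The minimization over $\theta$ can be sketched for a first-order autoregressive model, for which the whitening filter has $\gamma_\theta(\omega) = |1 - \theta \exp(2\pi i \omega)|^2$. Everything specific in this example is an assumption: the simulated data, the use of a Riemann sum over the Fourier frequencies in place of the integral, and the grid search. With that Riemann sum the criterion reduces to $1 + \theta^2 - 2\theta r$, where $r$ is the circular lag-one sample autocorrelation, so the minimizer can be verified against $r$.

```python
import cmath
import math
import random

def periodogram(y, omega):
    """Sample spectral density normalized by the sum of squares of the data."""
    n = len(y)
    transform = sum(y[t - 1] * cmath.exp(-2j * math.pi * omega * t) for t in range(1, n + 1))
    return abs(transform) ** 2 / sum(v * v for v in y)

def whitening_gain(theta, omega):
    """gamma_theta(omega): squared modulus of the AR(1) whitening filter transfer function."""
    return abs(1.0 - theta * cmath.exp(2j * math.pi * omega)) ** 2

# Illustrative data: a simulated AR(1) series with the sample mean removed.
rng = random.Random(0)
y = [0.0]
for _ in range(63):
    y.append(0.5 * y[-1] + rng.gauss(0.0, 1.0))
mean_y = sum(y) / len(y)
y = [v - mean_y for v in y]
n = len(y)

freqs = [k / n for k in range(n)]
raw = [periodogram(y, w) for w in freqs]

def whittle_criterion(theta):
    """Riemann-sum approximation of the integral of gamma_theta times the periodogram."""
    return sum(whitening_gain(theta, w) * p for w, p in zip(freqs, raw)) / n

grid = [i / 1000.0 for i in range(-990, 991)]
theta_hat = min(grid, key=whittle_criterion)
sigma2_hat = whittle_criterion(theta_hat)
circular_lag1 = sum(y[t] * y[(t + 1) % n] for t in range(n)) / sum(v * v for v in y)

print(theta_hat, circular_lag1, sigma2_hat, 1.0 - circular_lag1 ** 2)
```

A grid search is used only for transparency; for autoregressive models the minimizer solves linear (Yule-Walker type) equations and no search is needed.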


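The information divergence between the data and the fitted model is given by

$$IR_{-1}(\tilde{f}, f_{\hat{\theta}}) = \log \hat{\sigma}^2 - \log \tilde{\sigma}^2 = \tilde{I}_\infty - \hat{I}_\infty,$$

defining $-\hat{I}_\infty = \log \hat{\sigma}^2$,

$$-\tilde{I}_\infty = \log \tilde{\sigma}^2 = \int_0^1 \log \tilde{f}(\omega)\,d\omega.$$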


This criterion (however, corrected for bias in $\tilde{I}_\infty$) arises from information approaches
to model identification (Parzen (1983)). A model fitting criterion (but not a parameter
estimation criterion) is provided by the information increment

$$I(Y \mid \text{all past } Y;\ Y \text{ values in model } \theta)$$

$$= \int_0^1 -\log\{\tilde{f}(\omega)/f_{\hat{\theta}}(\omega)\}\,d\omega = IR_{-1}(\tilde{f}/f_{\hat{\theta}})_{0,.5}$$

One can regard it as a measure of the distance of the whitening spectral density

$$f^*(\omega) = \tilde{f}(\omega) / f_{\hat{\theta}}(\omega)$$

from a constant function; note that $f^*(\omega)$ is constructed to integrate to 1. When one
accepts that the optimal smoother of $f^*(\omega)$ is a constant, a "parameter-free" non-parametric
estimator of the spectral density $f(\omega)$ by a smoother of $\tilde{f}(\omega)$ is given by the parametric
estimator $f_{\hat{\theta}}(\omega)$. By "parameter-free" we mean that we are free to choose the parameters
to make the data (raw estimator) shape up to a smooth estimator. The parameters are
not regarded as having any significance or interpretation; they are merely coefficients of a
representation of $f(\omega)$.

Portmanteau statistics to test goodness of fit of a model to the time series use sums
of squares of correlations of residuals; an analogous statistic is

$$IR_1(\tilde{f}/f_{\hat{\theta}})_{0,.5} = \log \int_0^1 \{\tilde{f}(\omega)/f_{\hat{\theta}}(\omega)\}^2\,d\omega$$

Goodness of fit of the model to the data (as measured by how close $f^*(\omega)$ is to the spectral
density of white noise) is the ultimate model identification criterion to decide between
competing parametric models.

8.  Minimum  information  estimation  of  spectral  densities 

This  section  provides  a  perspective  on  maximum  entropy  spectral  estimation  from  the 
point  of  view  of  minimum  Renyi  information  approximation. 

The maximum entropy approach to the problem of spectral estimation of a stationary
time series was originated by Burg (1967). It derives a parametric model $f_\theta$ for the true $f$
by imposing constraints on linear functionals of $f$ of the form: for $v = 0, 1, \ldots, m$

$$\rho(v) = \langle \exp(2\pi i v\omega), f(\omega) \rangle_{L_2(0,1)} = \tilde\rho(v)$$

where $\tilde\rho(v)$ are estimators (from a sample) of autocorrelations. Note that the remarkable
properties of Burg’s estimators derive from the fact that he first estimates the partial
correlations in a novel way; they should not be interpreted as proof of the superiority of
the maximum entropy philosophy.

Let $\hat f_m$ denote the function (among all functions $f$ satisfying these constraints) which
minimizes the neg-entropy (of order $-1$)

$$-\int_0^1 \{\log f(\omega)\}\,d\omega.$$

The solution $\hat f_m$ has the following parametric form:

$$\{1/\hat f_m\} \text{ is a linear combination of } \exp(2\pi i v\omega),\ v = -m, \ldots, m.$$

The non-negativity of $\hat f_m$ then guarantees that $\hat f_m$ is an autoregressive spectral density:

$$\hat f_m(\omega) = \sigma_m^2 \Big| \sum_{j=0}^{m} a(j) \exp(2\pi i j\omega) \Big|^{-2}.$$
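As an illustration of the autoregressive form, the following Python sketch computes an order $m$ autoregressive spectral density whose first $m$ autocorrelations match given values. It assumes the coefficients are obtained from the Yule-Walker equations, which is one way to satisfy the autocorrelation constraints and is not Burg's partial-correlation algorithm; the function name `ar_spectral_density` is hypothetical.

```python
import numpy as np

def ar_spectral_density(rho, freqs):
    """Order-m autoregressive spectral density matching rho(0..m), rho(0) = 1.

    Assumption of this sketch: the coefficients come from the Yule-Walker
    equations, not from Burg's partial-correlation estimators.
    """
    rho = np.asarray(rho, dtype=float)
    m = len(rho) - 1
    lags = np.arange(m)
    # Toeplitz matrix with entries rho(|j - k|), j, k = 1, ..., m
    toeplitz = rho[np.abs(lags[:, None] - lags[None, :])]
    # a(0) = 1; a(1), ..., a(m) solve the Yule-Walker equations
    a_tail = np.linalg.solve(toeplitz, -rho[1:])
    a = np.concatenate(([1.0], a_tail))
    sigma2 = 1.0 + a_tail @ rho[1:]  # innovation variance sigma_m^2
    # sum_j a(j) exp(2 pi i j w), evaluated at each frequency w
    phases = np.exp(2j * np.pi * np.outer(freqs, np.arange(m + 1)))
    return sigma2 / np.abs(phases @ a) ** 2

# Autocorrelations 1, .5, .25 are those of an AR(1) series with coefficient .5
grid = np.arange(512) / 512
f_hat = ar_spectral_density([1.0, 0.5, 0.25], grid)
```

In this example the fitted density integrates to 1 over the grid and reproduces the lag one autocorrelation .5, so the constraints are met by a density of the autoregressive form.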

A negative opinion about applying autoregressive spectral estimates is expressed by
Diggle (1990), p. 112: “A final method, which we mention only briefly, is to fit an AR(p)
process to the data $\{y_t\}$ and to use the fitted autoregressive spectrum as the estimate of
$f(\omega)$. The motivation for this is threefold: fitting an autoregressive process is
computationally easy, autoregressive spectra can assume a wide variety of shapes, and automatic
criteria are available for choosing the value of p. Nevertheless, the method seems to fit
uneasily into a discussion of what is essentially a non-parametric estimation problem. It is
analogous to the use of polynomial regression for data smoothing, and is open to the same
basic objection, namely that it imposes global assumptions which can lead to artefacts in
the estimated spectrum.”

If one’s criterion is to minimize

$$IR_0(f)_{0,.5} = \int_0^{.5} \{f(\omega) \log f(\omega)\}\,d\omega,$$

the neg-entropy of order 0, the solution $\hat f_m$ obeys an exponential model (Bloomfield (1973)):

$$\{\log \hat f_m\} \text{ is a linear combination of } \exp(2\pi i v\omega),\ v = -m, \ldots, m.$$

These optimization problems are related to the problem: subject to the constraints,
with specified score functions $J_k(\omega)$,

$$\langle f, J_k \rangle = \text{specified constant for } k = 1, \ldots, m,$$

find a density $f(\omega)$, $0 < \omega < 1$, minimizing the $L_p$ norm, with $p = 1 + \lambda$,

$$\int_0^1 \{f(\omega)\}^{1+\lambda}\,d\omega.$$

Theorems about this problem are given by Chui, Deutsch, and Ward (1990).

Power correlations, inverse and cepstral correlations: Parametric models for the
spectral density $f$ can be obtained from various maximum entropy criteria. To check which
model is parsimonious, one requires goodness of fit procedures which check the significant
difference from zero of the Fourier transforms of various functions of $f$ such as $1/f$, $\log f$,
or $f^\lambda$.

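Let $e_v(\omega) = \exp(2\pi i v\omega)$, and interpret inner products as $L_2(0,1)$. Define:

inverse correlations

$$\rho^{(-1)}(v) = \langle e_v, 1/f \rangle$$

cepstral correlations

$$\rho^{(0)}(v) = \langle e_v, \log f \rangle$$

ordinary correlations

$$\rho^{(1)}(v) = \langle e_v, f \rangle$$

power correlations of power $\lambda$

$$\rho^{(\lambda)}(v) = \langle e_v, f^\lambda \rangle$$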

Identification  of  a  parametric  model  for  /  should  include  routine  estimation  and  interpre¬ 
tation  of  these  various  correlations. 
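A minimal numerical sketch of these correlations follows. It assumes that the spectral density is known on an equally spaced grid of frequencies $k/N$ and is symmetric about frequency .5, so that the Fourier coefficients are real; the function name `power_correlations` and the convention that power 0 means $\log f$ are choices made for this illustration.

```python
import numpy as np

def power_correlations(f_vals, lam, max_lag):
    """Approximate <e_v, f^lam>, v = 0..max_lag, with lam = 0 meaning log f.

    f_vals holds f at the grid points k/N, k = 0..N-1; f is assumed symmetric
    about frequency .5, so the coefficients are real.
    """
    f_vals = np.asarray(f_vals, dtype=float)
    transformed = np.log(f_vals) if lam == 0 else f_vals ** lam
    # The FFT divided by N is a Riemann sum for the Fourier coefficient integrals
    coeffs = np.fft.fft(transformed) / len(transformed)
    return coeffs[: max_lag + 1].real

# AR(1) spectral density with coefficient .5, normalized to integrate to 1
w = np.arange(512) / 512
f = 0.75 / np.abs(1 - 0.5 * np.exp(2j * np.pi * w)) ** 2

ordinary = power_correlations(f, 1, 3)   # close to 1, .5, .25, .125
inverse = power_correlations(f, -1, 3)   # vanishes beyond lag 1
cepstral = power_correlations(f, 0, 3)
```

For this first order autoregressive example the inverse correlations cut off after lag 1 while the ordinary and cepstral correlations decay geometrically, which is the kind of pattern one would inspect when choosing a parsimonious model.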

In general if one expands $f^\lambda(\omega)$ as a linear combination of orthogonal functions $J_k(\omega)$,
$0 < \omega < 1$, one forms the transform (called power orthogonal series coefficients)

$$\langle J_k, f^\lambda \rangle.$$

An open research problem is identification of appropriate orthogonal functions $J_k(\omega)$,
$0 < \omega < 1$.

9.  Tail  Classification  of  Probability  Laws  and  Spectral  Densities. 

This  section  discusses  models  for  spectral  density  functions  which  are  based  on  their 
analogy  with  quantile  density  functions. 

From extreme value theory, statisticians have long realized that it is useful to classify
distributions according to their tail behavior (behavior of $F(x)$ as $x$ tends to $\pm\infty$). It is
usual to distinguish three main types of distributions, called (1) limited, (2) exponential,
and (3) algebraic. Parzen (1979) proposes that this classification be expressed in terms of
the density quantile function $fQ(u)$; we call the types short, medium, and long tail.

A reasonable assumption about the distributions that occur in practice is that their
density-quantile functions are regularly varying in the sense that there exist tail exponents
$\alpha_0$ and $\alpha_1$ such that, as $u \to 0$,

$$fQ(u) = u^{\alpha_0} L_0(u), \qquad fQ(1 - u) = u^{\alpha_1} L_1(u)$$

where $L_j(u)$ for $j = 0, 1$ is a slowly varying function.

A function $L(u)$, $0 < u < 1$, is usually defined to be slowly varying as $u \to 0$ if, for
every $y$ in $0 < y < 1$, $L(yu)/L(u) \to 1$ or $\log L(yu) - \log L(u) \to 0$. For estimation of tail
exponents we will require further that, as $u \to 0$,

$$\int_0^1 \{\log L(uy) - \log L(u)\}\,dy \to 0$$

which we call integrally slowly varying. An example of a slowly varying function is
$L(u) = \{\log u^{-1}\}^{\beta}$.

Classification of tail behavior of probability laws. A probability law has a left tail type
and a right tail type depending on the value of $\alpha_0$ and $\alpha_1$. If $\alpha$ is the tail exponent, we
define:

    α < 0          super short tail
    0 ≤ α < 1      short tail
    α = 1          medium tail
    α > 1          long tail

Medium tailed distributions are further classified by the value of $J^* = \lim J(u)$:

    α = 1, J* = 0          medium-long tail
    α = 1, 0 < J* < ∞      medium-medium tail
    α = 1, J* = ∞          medium-short tail

One immediate insight into the meaning of tail behavior is provided by the hazard
function $h(x) = f(x) \div \{1 - F(x)\}$ with hazard quantile function $hQ(u) = fQ(u) \div \{1 - u\}$.
The convergence behavior of $h(x)$ as $x \to \infty$ is the same as that of $hQ(u)$ as $u \to 1$. From
the definitions one sees that $h^* = \lim_{x \to \infty} h(x)$ satisfies

    h* = ∞         (increasing hazard rate)    Short or medium-short tail
    0 < h* < ∞     (constant hazard rate)      Medium-medium tail
    h* = 0         (decreasing hazard rate)    Long or medium-long tail

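Formulas for computing tail exponents. The representation of $fQ(u)$ suggests a
formula for computation of tail exponents $\alpha_0$ and $\alpha_1$ (which may be adapted to provide
estimators from data):

$$-\alpha_0 = \lim_{u \to 0} \int_0^1 \{\log fQ(uy) - \log fQ(u)\}\,dy$$

$$-\alpha_1 = \lim_{u \to 0} \int_0^1 \{\log fQ(1 - yu) - \log fQ(1 - u)\}\,dy$$

Both formulas follow from the regular variation representation, since $\int_0^1 \log y\,dy = -1$ and the
slowly varying factor contributes nothing in the limit when it is integrally slowly varying.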
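The integral formulas can be tried numerically on laws whose density quantile function is known in closed form. The sketch below is only an illustration under stated assumptions: the limit $u \to 0$ is replaced by a single small value of $u$, the integral over $0 < y < 1$ is replaced by a midpoint rule, and the names `tail_exponent`, `exponential_fq`, and `pareto_fq` are hypothetical.

```python
import numpy as np

def tail_exponent(fq, side, u=1e-6, n_grid=200_000):
    """Evaluate the integral formula for a tail exponent at a small fixed u.

    fq is the density quantile function fQ; side is "left" (alpha_0) or
    "right" (alpha_1). The limit u -> 0 is replaced by one small u, and the
    integral over 0 < y < 1 by a midpoint rule; both are assumptions.
    """
    y = (np.arange(n_grid) + 0.5) / n_grid
    if side == "left":
        diff = np.log(fq(u * y)) - np.log(fq(u))
    else:
        diff = np.log(fq(1 - u * y)) - np.log(fq(1 - u))
    # The formula gives minus the exponent, so change sign
    return -diff.mean()

def exponential_fq(u):
    # Exponential law: F(x) = 1 - exp(-x), so fQ(u) = 1 - u
    return 1 - u

def pareto_fq(u, shape=2.0):
    # Pareto law F(x) = 1 - x^(-shape), x > 1: fQ(u) = shape (1 - u)^(1 + 1/shape)
    return shape * (1 - u) ** (1 + 1 / shape)

alpha_exp_right = tail_exponent(exponential_fq, "right")
alpha_pareto_right = tail_exponent(pareto_fq, "right")
```

The exponential law gives a right tail exponent close to 1 (medium tail) and the Pareto law with shape 2 gives a value close to 1.5 (long tail), in agreement with the classification.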

Memory classification of spectral densities: Spectral densities with no poles or zeroes
represent time series with bounded memory. We regard spectral density functions as
analogous to quantile density functions. A model for a spectral density with a pole or zero
at zero frequency (a similar representation holds for an arbitrary frequency $\omega_0$) is (Parzen
(1986))

$$f(\omega) = \omega^{-\delta} L(\omega)$$

where $L$ is a slowly varying function at $\omega = 0$ and $L(0) > 0$. An important role is played
by $f(1/n) = n^{\delta} L(1/n)$.

The spectral density is integrable if $\delta < 1$, which is the condition for stationarity. The
spectral density of a non-stationary time series needs careful definition. The case $\delta = 1$
is of particular interest; it corresponds to “1/f” noise. The case $\delta > 1$ could be called
“fractal noise”. A time series whose first difference is stationary has $\delta = 2$. Heuristically,
$\delta$ is interpreted for a zero mean time series $Y(\cdot)$ by

$$E\left| \frac{1}{\sqrt{n}} \sum_{t=1}^{n} Y(t) \right|^2 \text{ grows as } n^{\delta} L(1/n).$$

The index $\delta$ associated with frequency $\omega$ is interpreted:

$$E\left| \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \exp(2\pi i \omega t)\,Y(t) \right|^2 \text{ grows as } n^{\delta} L(1/n),$$

and, when $\delta = 0$, converges to $R(0)f(\omega)$ if it is finite.

This  approach  provides  definitions  of  spectral  density  for  asymptotically  stationary 
time  series  (Parzen  (1962)). 

Note that a finite dynamic range (bounded memory) spectral density has $\delta = 0$, but
$\delta = 0$ does not imply finite dynamic range since $f(\omega)$ can tend to $\infty$ as $\omega \to 0$; an example
is $f(\omega) \sim (\log \omega)^2$, $\rho(v) \sim (\log v)/v$ as $v \to \infty$.

A traditional parametrization of stationary long memory time series is $\delta = 2H - 1$,
where $H$ is the Hurst index satisfying $.5 < H < 1$; Hurst estimated $H = .7$ for the Nile
water level time series. The covariance function has the asymptotic representation

$$R(v) \text{ decays slowly like } v^{\delta - 1}.$$




The memory index delta plays an important theoretical role. In many time series
theorems the asymptotic behavior of a statistic is expressed in terms of $f(0)$, the value at
zero frequency of the spectral density function. These results often have analogies for long
memory time series if one replaces $f(0)$ by $f(1/n) = n^{\delta} L(1/n)$ asymptotically; compare Samarov
and Taqqu (1988).

Estimation of delta can be considered estimating a “fractal dimension”, the exponent
of  the  rate  of  growth  of  the  mean  of  the  sample  spectral  density.  Values  of  delta  are  used 
to  describe  music  and  how  the  brain  works!  U.  S.  News  and  World  Report,  June  11,  1990, 
p.  62  writes:  “Surprisingly,  the  same  mathematical  formula  that  characterizes  the  ebb 
and  flow  of  music  has  been  discovered  to  exist  widely  in  nature,  from  the  flow  of  the  Nile 
to  the  beating  of  the  human  heart  to  the  wobbling  of  the  earth’s  axis.  Remarkably,  this 
equation  is  closely  related  to  other  mathematical  formulas  used  by  computer  experts  to 
generate  amazingly  realistic  pictures  of  coastlines,  clouds  and  mountain  ranges  and  other 
natural  scenery.” 

Estimating  delta  from  data  has  many  of  the  same  difficulties  as  estimating  the  tail 
index  of  a  probability  distribution.  Since  delta  is  a  property  of  a  long  memory  time  series, 
it undoubtedly cannot be estimated with great accuracy from relatively short lengths of
observed  time  series. 
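One simple way to experiment with estimating delta is to regress the logarithm of the sample spectral density on minus the logarithm of frequency over the lowest Fourier frequencies. This log-periodogram regression is a technique brought in for illustration and is not a method proposed in this section; the function names and the choice of the number `n_low` of low frequencies are assumptions of the sketch.

```python
import numpy as np

def loglog_slope(freqs, power):
    """Least-squares slope of log(power) on -log(freq): the exponent delta."""
    x = -np.log(np.asarray(freqs, dtype=float))
    y = np.log(np.asarray(power, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

def estimate_delta(series, n_low):
    """Regress the log periodogram on -log frequency at the n_low lowest
    nonzero Fourier frequencies j/n, j = 1..n_low (requires n_low < n/2)."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    y = y - y.mean()
    dft = np.fft.rfft(y)
    freqs = np.arange(1, n_low + 1) / n
    periodogram = np.abs(dft[1 : n_low + 1]) ** 2 / n
    return loglog_slope(freqs, periodogram)

# An exact power law f(w) = 3 w^(-0.4) is recovered without error
low_freqs = np.arange(1, 65) / 4096
delta_exact = loglog_slope(low_freqs, 3.0 * low_freqs ** -0.4)

# For simulated white noise the estimate scatters around delta = 0
rng = np.random.default_rng(0)
delta_noise = estimate_delta(rng.standard_normal(8192), 400)
```

The scatter of such an estimate shrinks only slowly as the number of low frequencies grows, which illustrates why delta is hard to estimate from short series.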

We  would  appreciate  references  to  research  about  delta,  especially  the  conjectures  in 
this  section. 

10.  Sample  Brownian  Bridge  Exploratory  analysis  of  time  series. 

To a time series sample $\{Y(t), t = 1, 2, \ldots, n\}$ one can associate functions $\tilde d(u)$,
$0 < u < 1$, and

$$\tilde D(u) = \int_0^u \tilde d(s)\,ds$$

satisfying $\tilde D(1) = 0$. Let $\mu_n$ and $\sigma_n$ denote respectively the sample mean and sample
standard deviation. Define, for $j = 1, \ldots, n$,

$$\tilde d(u) = \sqrt{n}\,\{Y(j) - \mu_n\}/\sigma_n, \qquad \frac{j-1}{n} < u \le \frac{j}{n}.$$

Note that for $k = 1, \ldots, n$

$$\tilde D(k/n) = \frac{1}{\sqrt{n}} \sum_{j=1}^{k} \{Y(j) - \mu_n\}/\sigma_n.$$

We call $\tilde D(\cdot)$ the sample Brownian Bridge of an observed time series. We propose
that a time series analysis should routinely examine the graph of $\tilde D(u)$, $0 < u < 1$; one
can show by examples that it provides graphical tools of identification of various types of
long memory time series.
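The sample Brownian Bridge is simple to compute: at the points $k/n$ it is the cumulative sum of the standardized observations divided by $\sqrt{n}$. The sketch below assumes that the sample standard deviation uses divisor $n$ (the text does not specify the divisor), and the function name `sample_brownian_bridge` is hypothetical.

```python
import numpy as np

def sample_brownian_bridge(y):
    """Return the grid k/n, k = 0..n, and the values
    D(k/n) = n^(-1/2) * sum_{j<=k} (Y(j) - mean) / sd, with D(0) = 0."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Standardize; np.std uses divisor n, an assumption of this sketch
    z = (y - y.mean()) / y.std()
    bridge = np.concatenate(([0.0], np.cumsum(z))) / np.sqrt(n)
    return np.arange(n + 1) / n, bridge

# A short trending series: the bridge dips below zero and returns to 0 at u = 1
grid_u, bridge_vals = sample_brownian_bridge([1.0, 2.0, 3.0, 4.0])
```

Plotting `bridge_vals` against `grid_u` gives the graph recommended for routine examination; because the standardized observations sum to zero, the path is tied down at both ends.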

Theorem: For a stationary time series with bounded memory

$$\{\tilde D(u), 0 < u < 1\} \text{ converges in distribution to } \{\{f(0)\}^{.5} B(u), 0 < u < 1\},$$

where $B(u)$, $0 < u < 1$, is a Brownian Bridge stochastic process and $f(0)$ is the spectral
density at zero frequency.

For long memory time series we would like to understand the asymptotic behavior of

$$\{f(1/n)\}^{-.5}\,\tilde D(u), \quad 0 < u < 1.$$

Simulation and Time Series: Note that similar processes are studied by researchers
(Schruben, Iglehart) in operations research departments who study simulation methods of
forming confidence intervals for $\mu$, the true mean of $Y(\cdot)$; they standardize $\tilde D(u)$ by its
maximum minus its minimum.

Quality Control and Time Series: The process $\tilde D(\cdot)$ also has applications to quality
control problems of identifying departures from the null hypothesis that $Y(\cdot)$ is white noise.
Components (linear functionals) of $\tilde d(\cdot)$ are related to accumulation analysis methods of
Taguchi.